Exponential Smoothing (ES) has been used extensively as a forecasting
technique since its introduction in the 1960s. It is simple, hence easy to
implement, and in many cases performs surprisingly well. However, many
phenomena require a more sophisticated forecasting technique. In this paper
we introduce a new forecasting technique, Adaptive Gradient Exponential
Smoothing (AGES). This technique extends the classical ES as used on simple
data or on data with linear trend. For data with both linear trend and seasonal
effects this extension results in a new and more general form of ES, which is
presented in this paper. The new forecasting technique is tested on simulated
data and on some real data of the types mentioned above, and its performance in
all these tests is clearly superior to that of ES. It is shown by analysis and by
experiments that for certain types of data it does in fact converge to the
optimal (in the mean square error sense) forecasts.

I. INTRODUCTION 

The need for quick and reliable forecasts of various time series is 
often encountered in economic and business situations. In the Bell 
System, forecasting is used to help plan trunk and facilities for the
telephone network (Refs. 1 through 3), as well as to project computer workload, to
determine staffing levels for operators or service observers, and more. 

Many forecasting techniques exist and different time series may 
require different techniques. In general, there is a clear trade-off
between simplicity (resulting in cheaper implementation) and performance
of the forecasting technique. One of the simplest forecasting
techniques, Exponential Smoothing (ES), performs surprisingly well.
This technique was presented originally by Winters (Ref. 4) and
Brown (Ref. 5) and is described briefly in Section II. The optimality
properties of ES are studied in Ref. 6; we expand on these studies and use
their conclusions as the basis for the new technique we introduce here.

In fact, these studies revealed a relationship between ES and
the Autoregressive Integrated Moving Average (ARIMA) model-fitting-based
forecasting suggested by Box and Jenkins (Ref. 7). This is further
discussed in Section III.

The extensive use of ES clearly indicated that for time series with
nonstationary discontinuities or changes in the generating parameters,
ES performance is not satisfactory. This prompted a number of
researchers to develop the Adaptive Exponential Smoothing (AES)
idea. In these techniques the algorithm is meant to evaluate its
own performance and correct its parameters to obtain improved
performance. Recently, the existing AESs (see, for example, Refs. 8
through 11) were reviewed critically by Ekern (Ref. 12). One of the points
raised in Ref. 12 was that none of the existing AESs is supported by
analysis or general performance claims (e.g., optimality). In addition,
it should be pointed out that only Roberts' and Reed's AES (Ref. 11) can be
used on data with both linear trend and seasonal effects, while the
other AESs are limited to simpler data and have no natural generalization.

In this paper we present a new AES algorithm, which we call 
Adaptive Gradient Exponential Smoothing (AGES). This technique 
naturally generalizes to data with both linear trend and seasonal effect. 
In addition, analysis of AGES for simple data and extensive simula- 
tions, using simple as well as more general data, strongly suggests that 
this technique converges to optimal performance in the mean square 
error (MSE) sense. 

Section II presents ES as commonly used. A new, more general form 
is developed with a discussion of its optimal properties. The new 
technique, AGES, is derived and presented in Section III, while the 
results of experiments with this technique on both real and simulated 
data are presented in Section IV. 

II. EXPONENTIAL SMOOTHING AND ITS OPTIMAL PROPERTIES 

First we consider ES as Winters (Ref. 4) did for three types of data: simple*
(S), with linear trend (LT), and with both linear trend and multiplicative
seasonal effects (LSM). Common to all the configurations is
the following: a time series $\{x(t)\}$ is measured every time interval $T$
(e.g., hour, day, or week), and $t$ is an integer representing the time $tT$.
Then, one is interested in forecasting the value $x(t+1)$* based on the
data available up to and including $t$, namely $x(0), x(1), \ldots, x(t)$.

* Simple data are of the form $a + n(t)$, where $a$ is a fixed value and $n(t)$ is noise with
zero mean.

If $\hat{x}(t+1)$ denotes the forecast, carried out at time $t$ for $x(t+1)$,
from Ref. 4 we have (using our own notation for consistency with the
discussions in the sequel), for S data:

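$$\hat{x}(t+1) = \alpha x(t) + (1-\alpha)\hat{x}(t) \tag{1a}$$

$$0 \le \alpha \le 1; \tag{1b}$$

for LT data:

$$\hat{x}(t+1) = \hat{a}(t) + \hat{b}(t) \tag{2a}$$

$$\hat{a}(t) = \alpha x(t) + (1-\alpha)[\hat{a}(t-1) + \hat{b}(t-1)] \tag{2b}$$

$$\hat{b}(t) = \beta[\hat{a}(t) - \hat{a}(t-1)] + (1-\beta)\hat{b}(t-1) \tag{2c}$$

$$0 \le \alpha, \beta \le 1; \tag{2d}$$

and for LSM data:

$$\hat{x}(t+1) = (\hat{a}(t) + \hat{b}(t))\,\hat{c}(t-L+1) \tag{3a}$$

$$\hat{a}(t) = \alpha\frac{x(t)}{\hat{c}(t-L)} + (1-\alpha)[\hat{a}(t-1) + \hat{b}(t-1)] \tag{3b}$$

$$\hat{b}(t) = \beta[\hat{a}(t) - \hat{a}(t-1)] + (1-\beta)\hat{b}(t-1) \tag{3c}$$

$$\hat{c}(t) = \gamma\frac{x(t)}{\hat{a}(t)} + (1-\gamma)\hat{c}(t-L) \tag{3d}$$

$$0 \le \alpha, \beta, \gamma \le 1, \tag{3e}$$

where $L$ is the known periodicity of the season.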
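
The recursions (1) through (3) can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names, the argument order, and the way the initial state (`xhat0`, `a0`, `b0`, `c0`) is supplied are all assumptions made here, since the text does not specify an initialization.

```python
def es_simple(x, alpha, xhat0):
    """One-step forecasts by eq. (1a); returns [xhat(0), ..., xhat(len(x))]."""
    xhat = [xhat0]                      # assumed initial forecast of x(0)
    for xt in x:
        xhat.append(alpha * xt + (1 - alpha) * xhat[-1])
    return xhat


def es_linear_trend(x, alpha, beta, a0, b0):
    """One-step forecasts by eqs. (2a)-(2c); a0, b0 are the assumed
    estimates a(-1), b(-1) before the first observation."""
    a, b = a0, b0
    xhat = [a + b]
    for xt in x:
        a_new = alpha * xt + (1 - alpha) * (a + b)      # eq. (2b)
        b = beta * (a_new - a) + (1 - beta) * b         # eq. (2c)
        a = a_new
        xhat.append(a + b)                              # eq. (2a)
    return xhat


def es_lsm(x, alpha, beta, gamma, a0, b0, c0):
    """One-step forecasts by eqs. (3a)-(3d); c0 holds the L assumed
    initial seasonal factors c(-L), ..., c(-1)."""
    L = len(c0)
    a, b = a0, b0
    c = list(c0)                        # c[k + L] stores c(k)
    xhat = [(a + b) * c[0]]
    for t, xt in enumerate(x):
        a_new = alpha * xt / c[t] + (1 - alpha) * (a + b)   # eq. (3b)
        b = beta * (a_new - a) + (1 - beta) * b             # eq. (3c)
        c.append(gamma * xt / a_new + (1 - gamma) * c[t])   # eq. (3d)
        a = a_new
        xhat.append((a + b) * c[t + 1])                     # eq. (3a)
    return xhat
```

On noise-free data that start from the true level, slope, and seasonal factors, each of these recursions reproduces the data exactly, which is a convenient sanity check before applying them to noisy series.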

In all the equations above, the parameters $\alpha$, $\beta$, and $\gamma$ are called the
"smoothing coefficients".

Our first step is to rewrite eq. (1) and, more importantly, eq. (2). 
This provides the basis for a new form of ES for LSM data, more 
general than (3). The new form, which is a natural extension of (1) 
and (2), suggests types of data for which the ES algorithm can result 
in optimal (in the MSE sense) performance. 

*Note that we restrict our discussions to one-interval-ahead forecasting with the
understanding that it can be generalized to more time intervals ahead.

Equation (1) can be readily rewritten as

$$\hat{x}(t+1) = \hat\theta_1\hat{x}(t) + (1-\hat\theta_1)x(t), \tag{4a}$$

where clearly

$$\hat\theta_1 = 1 - \alpha. \tag{4b}$$

With some algebra one can show that eq. (2) is equivalent to

$$\hat{x}(t+1) = \hat\theta_1\hat{x}(t) + \hat\theta_2\hat{x}(t-1) + (2-\hat\theta_1)x(t) - (1+\hat\theta_2)x(t-1), \tag{5a}$$

where

$$\hat\theta_1 = 2 - \alpha(1+\beta) \tag{5b}$$

$$\hat\theta_2 = \alpha - 1. \tag{5c}$$

The basic difference between (2) and (5) is that (5) reflects the
assumption that the noise-free part of the data $x(t)$ is generated by
the difference equation

$$y(t) - 2y(t-1) + y(t-2) = 0, \tag{6}$$

while (2) reflects the assumption that the solution of (6) is

$$y(t) = a + bt. \tag{7}$$

[Note that in (2) $\hat{a}(t)$ is the current estimate of "$a + bt$" and $\hat{b}(t)$ is the
current estimate of "$b$."]
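
The claimed equivalence of eq. (2) and eq. (5) can be checked numerically. The sketch below is illustrative only: the function names and the initial state `a0`, `b0` are assumptions, and form (5) is seeded with the first two forecasts produced by form (2), since (5a) needs two past forecasts before it can run.

```python
def holt_forecasts(x, alpha, beta, a0, b0):
    """Forecasts from the level/slope recursions (2a)-(2c)."""
    a, b = a0, b0
    xhat = [a + b]
    for xt in x:
        a_new = alpha * xt + (1 - alpha) * (a + b)      # eq. (2b)
        b = beta * (a_new - a) + (1 - beta) * b         # eq. (2c)
        a = a_new
        xhat.append(a + b)                              # eq. (2a)
    return xhat


def form5_forecasts(x, alpha, beta, xhat0, xhat1):
    """Forecasts from eq. (5a), seeded with two given forecasts."""
    th1 = 2 - alpha * (1 + beta)        # eq. (5b)
    th2 = alpha - 1                     # eq. (5c)
    xhat = [xhat0, xhat1]
    for t in range(1, len(x)):
        xhat.append(th1 * xhat[t] + th2 * xhat[t - 1]
                    + (2 - th1) * x[t] - (1 + th2) * x[t - 1])
    return xhat


# An arbitrary deterministic test series (trend plus a repeating wiggle).
x = [10 + 0.5 * t + ((7 * t) % 5 - 2) for t in range(30)]
holt = holt_forecasts(x, 0.4, 0.3, 9.0, 0.7)
alt = form5_forecasts(x, 0.4, 0.3, holt[0], holt[1])
max_gap = max(abs(u - v) for u, v in zip(holt, alt))
```

Here `max_gap` measures the largest disagreement between the two forms; it should be at the level of floating-point rounding.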

The ES as given in (3) for LSM data is clearly based on the
assumption that the noise-free part of the data has the form

$$y(t) = (a + bt)\,c(t), \tag{8a}$$

where

$$c(t+L) = c(t). \tag{8b}$$

The difference equation satisfied by (8) is

$$y(t) - 2y(t-L) + y(t-2L) = 0, \tag{9}$$

and the corresponding ES is

$$\hat{x}(t+1) = \sum_{j=1}^{M}\hat\theta_j\hat{x}(t-j+1) - \sum_{j=1}^{M}\hat\theta_j x(t-j+1) + 2x(t-L+1) - x(t-2L+1). \tag{10}$$

The parameters $\hat\theta_j$, $j = 1, \ldots, M$, and the constraints they have to
satisfy are discussed later. Also, the claimed correspondence between
(9) and (10) will become more apparent in later discussion.

At this point, however, we emphasize that while (7) is the general
solution of (6), and thus (2) and (5) are equivalent, (8) is only one of
many possible solutions of (9). Hence (10) represents an ES form that
is more general than (3).

Similarly, data with linear trend and additive seasonal effects*
(LSA) have the underlying difference equation

$$y(t) - y(t-1) - y(t-L) + y(t-L-1) = 0 \tag{11}$$

and the corresponding ES is

$$\hat{x}(t+1) = \sum_{j=1}^{M}\hat\theta_j\hat{x}(t-j+1) - \sum_{j=1}^{M}\hat\theta_j x(t-j+1) + x(t) + x(t-L+1) - x(t-L). \tag{12}$$

To unify and simplify the discussions ahead we introduce the
following notation. Let $D$ be a unit delay operator, namely $Dx(t) =
x(t-1)$, and let $A(D)$ be a polynomial in $D$ such that

$$A(D) = \begin{cases} 1 & \text{for S data} \\ 2 - D & \text{for LT data} \\ 2D^{L-1} - D^{2L-1} & \text{for LSM data} \\ 1 + D^{L-1} - D^{L} & \text{for LSA data.} \end{cases} \tag{13}$$

With these definitions (4), (5), (10), and (12) can be unified as

$$\hat{x}(t+1) = \sum_{j=1}^{M}\hat\theta_j D^{j-1}(\hat{x}(t) - x(t)) + A(D)x(t), \tag{14}$$

where $M = 1$ will result in (4) and $M = 2$ in (5).

It should also be pointed out that the ES as given by eq. (1) [(2)
or (3)] carries an implicit assumption, namely that one
(two or three) coefficient(s) can, in fact, smooth the data. In other
words, $M$ in (14) is equal to one (two or three). However, the general
form (14) allows for a larger number of coefficients to get better
approximations.
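
The unified form (14) lends itself to a single routine in which the data type enters only through the coefficients of $A(D)$. The following is a sketch under assumed names (`a_of_d`, `es_forecast`); representing $A(D)$ as a lag-to-coefficient mapping is a choice made here for illustration, not something prescribed by the text.

```python
def a_of_d(kind, L=None):
    """Coefficients of A(D) in eq. (13) as a {lag: coefficient} mapping,
    so that A(D)x(t) = sum(coef * x(t - lag))."""
    if kind == "S":
        return {0: 1.0}
    if kind == "LT":
        return {0: 2.0, 1: -1.0}
    if L is None or L < 2:
        raise ValueError("seasonal data need a period L >= 2")
    if kind == "LSM":
        return {L - 1: 2.0, 2 * L - 1: -1.0}
    if kind == "LSA":
        return {0: 1.0, L - 1: 1.0, L: -1.0}
    raise ValueError("unknown data type: " + kind)


def es_forecast(x, xhat, theta_hat, A):
    """Eq. (14): forecast of x(t+1) with t = len(x) - 1.

    x and xhat hold x(0..t) and xhat(0..t); theta_hat[j-1] is the j-th
    smoothing coefficient. Enough history is assumed to be available."""
    t = len(x) - 1
    smooth = sum(th * (xhat[t - j] - x[t - j])
                 for j, th in enumerate(theta_hat))
    return smooth + sum(c * x[t - lag] for lag, c in A.items())
```

For example, `es_forecast(x, xhat, [0.3], a_of_d("S"))` evaluates (4a), and a two-element `theta_hat` with `a_of_d("LT")` evaluates (5a).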

*This type of data was not addressed in Ref. 4 and, as far as we know, no form of
ES applicable to it was proposed before the one here.

To observe the optimal properties of the ES forecasts we define the
forecast error as

$$e(t) = x(t) - \hat{x}(t) \tag{15}$$

and use as our criterion for the forecast quality the mean square error
(MSE), i.e., $E\{e^2(t)\}$. With this in mind, it is clear that optimal
performance is achieved if $e(t)$ becomes a white noise sequence
(i.e., independent and identically distributed with zero mean). Namely,
the ES technique, while assuming knowledge of the generating process
for the noise-free component of the data, attempts to "whiten" the
noise component. This attempt implies an underlying assumption that
the data are generated through, or at least approximated by, the
process

$$[1 - DA(D)]\,x(t) = \Big[1 - \sum_{j=1}^{M}\theta_j D^j\Big]\epsilon(t), \tag{16}$$

where $\epsilon(t)$ is a white noise with variance $\sigma^2$.
Substituting (14) and (16) into (15) results in

$$\Big[1 - \sum_{j=1}^{M}\hat\theta_j D^j\Big]e(t) = \Big[1 - \sum_{j=1}^{M}\theta_j D^j\Big]\epsilon(t). \tag{17}$$



This equation satisfied by $e(t)$ is the basis of our claims for correspondence
between eqs. (9) and (10), and (11) and (12). Equation (17)
immediately suggests the conditions for optimal forecasting. First, to
get bounded MSE one must require:

Condition 1: All zeros of the polynomial $[1 - \sum_{j=1}^{M}\hat\theta_j D^j]$ are outside
the unit circle.

If, in addition, we also require:

Condition 2: $\hat\theta_j = \theta_j$, $j = 1, 2, \ldots, M$,

then, clearly, from eq. (17), $e(t)$ will converge to $\epsilon(t)$ and optimal
forecasting (in the MSE sense) is achieved.

Remark 1: As we discussed here, the sufficiency of Conditions 1 and
2 is quite obvious; however, they are also necessary. This
is argued in Appendix A.

Remark 2: In Ref. 4, $\alpha$ and $\beta$ for LT data are restricted to the interval
[0, 1], which corresponds to the set $S_2$ in Fig. 1. The actual
constraints follow from applying Condition 1 to the $M = 2$
case. This results in the set $S_1$ in Fig. 1, which clearly
contains $S_2$ and is considerably larger. Allowing for a larger
constraint set for $\hat\theta_1$ and $\hat\theta_2$ (or, correspondingly, $\alpha$, $\beta$) will
result in more cases for which ES could result in optimal
performance.
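
For $M = 2$, Condition 1 and the triangular set $S_1$ of Fig. 1 can be compared directly. The sketch below uses hypothetical helper names; it tests Condition 1 through the reciprocals of the zeros, which is an implementation convenience chosen here (a zero of $1 - \hat\theta_1 D - \hat\theta_2 D^2$ lies outside the unit circle exactly when its reciprocal lies inside).

```python
import cmath


def in_S1(th1, th2):
    """Membership in the set S_1 of Fig. 1."""
    return abs(th2) < 1 and th2 + th1 < 1 and th2 - th1 < 1


def reciprocal_zeros(th1, th2):
    """Roots of lam^2 - th1*lam - th2 = 0, i.e., the reciprocals of the
    zeros of 1 - th1*D - th2*D^2."""
    root = cmath.sqrt(th1 * th1 + 4 * th2)
    return (th1 + root) / 2, (th1 - root) / 2


def satisfies_condition_1(th1, th2):
    """Condition 1 for M = 2, checked through the reciprocal zeros."""
    return all(abs(lam) < 1 for lam in reciprocal_zeros(th1, th2))


def theta_from_alpha_beta(alpha, beta):
    """Map (alpha, beta) to (theta_hat_1, theta_hat_2) by eqs. (5b), (5c)."""
    return 2 - alpha * (1 + beta), alpha - 1


# Interior points of the classical range 0 < alpha, beta < 1 (the set S_2).
grid = [k / 10 for k in range(1, 10)]
s2_points = [theta_from_alpha_beta(a, b) for a in grid for b in grid]
s2_inside_s1 = all(in_S1(t1, t2) for t1, t2 in s2_points)
```

A point such as $(\hat\theta_1, \hat\theta_2) = (0, 0.5)$ satisfies Condition 1 but cannot be reached from any $\alpha$, $\beta$ in [0, 1], since it would require $\alpha = 1.5$; this illustrates the sense in which $S_1$ is larger than $S_2$.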

III. ADAPTIVE GRADIENT EXPONENTIAL SMOOTHING 

In the previous section we argued that for data that can be approximated
by (16), forecasting with ES of the form (14) can result in
optimal performance in the MSE sense. To achieve this, Conditions 1
and 2 must be satisfied. However, while Condition 1 can be satisfied
by proper choice of $\hat\theta_j$, Condition 2 is, in general, hard to satisfy since
the values of $\theta_j$ in eq. (16) are not known. Basically, the ARIMA
model-fitting-based forecasting (Ref. 7) deals with exactly this type of
problem. The $\theta_j$'s of eq. (16) are estimated and these estimates are then
used as the $\hat\theta_j$'s in eq. (14) in an attempt to satisfy Condition 2. In the
ES algorithm no such attempt is made. In practice, the forecasters
using ES choose some fixed values for the $\hat\theta_j$, which satisfy Condition
1 [or even more restrictively, e.g., eq. (2d)]. These values are based on
intuition, experience, and familiarity with the data they forecast.

Fig. 1. The constraint sets:

$$S_1 = \{(\hat\theta_1, \hat\theta_2) : |\hat\theta_2| < 1,\ \hat\theta_2 + \hat\theta_1 < 1,\ \hat\theta_2 - \hat\theta_1 < 1\}$$

$$S_2 = \{(\hat\theta_1, \hat\theta_2) : \hat\theta_1 = 2 - \alpha(1+\beta),\ \hat\theta_2 = \alpha - 1,\ 0 \le \alpha, \beta \le 1\}.$$

However, considerable differences between the underlying $\theta_j$'s and
the chosen $\hat\theta_j$'s can result in significant performance degradation. This
is demonstrated in Fig. 2 for the case $M = 2$. The MSE for this case
was computed in closed form as a function of $\theta_1$ and $\theta_2$ for some
fixed $\hat\theta_1$ and $\hat\theta_2$ and graphed in the figure. Together with phenomena
like nonstationary discontinuity* and changes in the data-generating
process (i.e., the $\theta_j$ change values), this resulted in unsatisfactory
performance of the ES. The realization of what may cause this poor
performance brought about the idea of using adaptive schemes where
the $\hat\theta_j$ are not fixed but are adjusted in an attempt to improve performance.

* Step-like changes in the data.

Fig. 2. The mean squared error as a function of the data-generating parameters for
$M = 2$. (The smoothing coefficients are fixed at $\hat\theta_1 = -0.3$, $\hat\theta_2 = -0.3$.)
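
The effect of a coefficient mismatch is easiest to see for $M = 1$, where eq. (17) reads $e(t) = \hat\theta_1 e(t-1) + \epsilon(t) - \theta_1\epsilon(t-1)$. The stationary variance of this first-order process gives $E\{e^2\} = \sigma^2[1 + (\hat\theta_1 - \theta_1)^2/(1 - \hat\theta_1^2)]$. This closed form is worked out here as an illustration for the $M = 1$ case only (the paper's Fig. 2 concerns $M = 2$), and the function names are assumptions.

```python
def mse_m1(theta_hat, theta, sigma2=1.0):
    """Steady-state MSE for M = 1 with a fixed smoothing coefficient."""
    if abs(theta_hat) >= 1:
        raise ValueError("Condition 1 requires |theta_hat| < 1")
    return sigma2 * (1 + (theta_hat - theta) ** 2 / (1 - theta_hat ** 2))


def mse_m1_by_iteration(theta_hat, theta, sigma2=1.0, steps=2000):
    """Same quantity obtained by iterating the variance recursion
    v <- theta_hat^2 * v + sigma2 * (1 + theta^2 - 2*theta*theta_hat)."""
    v = sigma2
    for _ in range(steps):
        v = theta_hat ** 2 * v + sigma2 * (1 + theta ** 2
                                           - 2 * theta * theta_hat)
    return v
```

With matching coefficients the MSE equals $\sigma^2$; as $\hat\theta_1$ moves away from $\theta_1$, or toward the boundary $|\hat\theta_1| = 1$, the MSE grows.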



Compared to the existing Adaptive Exponential Smoothing (AES)
techniques (see, e.g., Refs. 8 through 11), the new technique we
introduce here is analytically more sound, and there are strong indications
that it converges to optimal performance in the MSE sense
for the data approximated by (16).

This new technique is based on the gradient search for the minimum
of the MSE. If the MSE were available as a function of
the $\hat\theta_j$, then one could compute the gradient

$$\nabla = \frac{\partial E\{e^2(t)\}}{\partial\hat{\boldsymbol\theta}}, \tag{18}$$

where $\hat{\boldsymbol\theta} = [\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_M]^T$, and recursively update the $\hat\theta_j$ through

$$\hat{\boldsymbol\theta}(t+1) = \hat{\boldsymbol\theta}(t) - \mu\nabla, \tag{19}$$

where $\mu > 0$ is the adaptation constant. This is the gradient search
technique, sometimes referred to as the steepest descent technique. In
general, however, the MSE is not available as a function of the $\hat\theta_j$;
hence, neither is the gradient. Instead, we use an instantaneous
estimate of this gradient. To get this estimate we replace $E\{e^2(t)\}$ by
$e^2(t)$ and the gradient by

$$\hat\nabla = \frac{\partial e^2(t)}{\partial\hat{\boldsymbol\theta}} = 2e(t)\frac{\partial e(t)}{\partial\hat{\boldsymbol\theta}}. \tag{20}$$

Let us denote

$$\mathbf{s}(t) = \frac{\partial e(t)}{\partial\hat{\boldsymbol\theta}}$$

as the "sensitivity vector," since it gives an indication of how sensitive
the error $e(t)$ is to the values of $\hat\theta_j$.

While $\mathbf{s}(t)$ is not available we can use eq. (17) to develop a means
for generating it. Let us take partial derivatives of both sides of this
equation with respect to $\hat{\boldsymbol\theta}$. Since the right-hand side does not depend
explicitly on $\hat{\boldsymbol\theta}$ we get:

$$\Big[1 - \sum_{i=1}^{M}\hat\theta_i D^i\Big]s_j(t) = D^j e(t), \qquad j = 1, 2, \ldots, M. \tag{21}$$

At this point we are ready to introduce the Adaptive Gradient
Exponential Smoothing (AGES) technique. Combining eqs. (14), (19),
(20), and (21) we get:

the forecast:

$$\hat{x}(t+1) = A(D)x(t) - \sum_{i=1}^{M}\hat\theta_i(t)\,e(t-i+1) \tag{22}$$

[see definition of $A(D)$ in eq. (13)],
the sensitivity functions:

$$s_j(t+1) = \sum_{i=1}^{M}\hat\theta_i(t)\,s_j(t-i+1) + e(t-j+1), \qquad j = 1, 2, \ldots, M, \tag{23}$$

and the coefficient adjustments:

$$\hat\theta_j(t+1) = \hat\theta_j(t) - 2\mu e(t)s_j(t), \qquad j = 1, 2, \ldots, M. \tag{24}$$

Recall that the error $e(t) = x(t) - \hat{x}(t)$.
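
Equations (22) through (24) translate into a short loop. The sketch below is one possible reading, not the authors' implementation: the function names, the zero start-up values for errors and sensitivities, the default initial coefficients, and the data generator used for the demonstration are all assumptions made here.

```python
import random


def past(seq, k):
    """seq[k] for k >= 0, else 0.0 (assumed zero history before start-up)."""
    return seq[k] if k >= 0 else 0.0


def ages(x, A, M, mu, theta0=None):
    """Run AGES over x. A is a {lag: coefficient} mapping for A(D) of
    eq. (13). Returns (forecasts, errors, coefficient history)."""
    n = len(x)
    theta = list(theta0) if theta0 is not None else [0.0] * M
    e = [0.0] * n
    s = [[0.0] * (n + 1) for _ in range(M)]   # s[j][t] holds s_{j+1}(t)
    xhat = [None] * (n + 1)
    history = []
    for t in range(max(A), n):
        if xhat[t] is not None:
            e[t] = x[t] - xhat[t]
        # Forecast, eq. (22), using the current coefficients theta(t).
        xhat[t + 1] = (sum(c * x[t - lag] for lag, c in A.items())
                       - sum(theta[i] * past(e, t - i) for i in range(M)))
        # Sensitivity functions, eq. (23).
        for j in range(M):
            s[j][t + 1] = (sum(theta[i] * past(s[j], t - i)
                               for i in range(M)) + past(e, t - j))
        # Coefficient adjustments, eq. (24).
        theta = [theta[j] - 2 * mu * e[t] * s[j][t] for j in range(M)]
        history.append(list(theta))
    return xhat, e, history


def simulate_simple(theta, n, sigma=1.0, seed=0):
    """S-type data from eq. (16) with M = 1: x(t) - x(t-1) equals
    eps(t) - theta * eps(t-1); zero initial conditions are assumed."""
    rng = random.Random(seed)
    x, prev_x, prev_eps = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        prev_x = prev_x + eps - theta * prev_eps
        x.append(prev_x)
        prev_eps = eps
    return x
```

A typical use is `xhat, e, history = ages(simulate_simple(0.5, 20000), {0: 1.0}, M=1, mu=0.002)`, after which the late entries of `history` can be compared with the generating coefficient. The adaptation constant `mu` trades convergence speed against the fluctuation of the coefficients around their limit.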

Both our simulations and our experiments (as described in the next
section) strongly indicate that AGES converges to optimal performance
through convergence of $\hat\theta_j(t)$ to $\theta_j$. Namely, the error $e(t)$ is
adaptively whitened. Despite these indications, since the resulting
equations are quite complex, a global proof of convergence of the
AGES technique is beyond the scope of this paper. However, we
conclude this section by treating the special case $M = 1$ and showing
local convergence properties for it.

Let $M = 1$; then eqs. (17), (23), and (24) become

$$e(t+1) = \hat\theta_1(t)e(t) + \epsilon(t+1) - \theta_1\epsilon(t)$$

$$s_1(t+1) = \hat\theta_1(t)s_1(t) + e(t),$$

and

$$\hat\theta_1(t+1) = \hat\theta_1(t) - 2\mu e(t)s_1(t).$$

Assuming $\hat\theta_1(t)$ is independent of $e(t)$ and $s_1(t)$ (similar assumptions
are common in convergence proofs of adaptive filters) and observing
that $E\{e(t)\epsilon(t)\} = \sigma^2$, $E\{e(t)\epsilon(t+1)\} = 0$, $E\{s_1(t)\epsilon(t)\} = 0$, we get

$$E\{e^2(t+1)\} = E\{\hat\theta_1^2(t)\}\,E\{e^2(t)\} + \sigma^2[1 + \theta_1^2 - 2\theta_1 E\{\hat\theta_1(t)\}]$$

$$E\{s_1(t+1)e(t+1)\} = E\{\hat\theta_1^2(t)\}\,E\{s_1(t)e(t)\} + E\{\hat\theta_1(t)\}\,E\{e^2(t)\} - \theta_1\sigma^2$$

$$E\{\hat\theta_1(t+1)\} = E\{\hat\theta_1(t)\} - 2\mu E\{s_1(t)e(t)\}. \tag{25}$$

If we assume in addition that $\hat\theta_1(t)$ has a small variance, namely
$E\{\hat\theta_1^2(t)\} = [E\{\hat\theta_1(t)\}]^2$ (the simulation results tend to support this
assumption), defining

$$\gamma_1(t) = E\{e^2(t)\} - \sigma^2$$

$$\gamma_2(t) = E\{s_1(t)e(t)\}$$

$$\gamma_3(t) = E\{\hat\theta_1(t)\} - \theta_1 \tag{26}$$

and substituting in (25) results in

$$\gamma_1(t+1) = [\gamma_3(t) + \theta_1]^2\gamma_1(t) + \sigma^2[\gamma_3(t)]^2$$

$$\gamma_2(t+1) = [\gamma_3(t) + \theta_1]^2\gamma_2(t) + [\gamma_3(t) + \theta_1]\gamma_1(t) + \sigma^2\gamma_3(t)$$

$$\gamma_3(t+1) = \gamma_3(t) - 2\mu\gamma_2(t). \tag{27}$$

Clearly, if we could prove that $\gamma_1(t)$, $\gamma_2(t)$, and $\gamma_3(t)$ converge to the
origin globally (i.e., independent of the initial values), it would mean
that [see eq. (26)] the MSE converges to the minimum $\sigma^2$ and $E\{\hat\theta_1(t)\}$
converges to $\theta_1$. However, despite strong indications from our simulations
that these variables do converge globally, we can prove only local
convergence. In addition, the proof provides an indication as to how
to choose the parameter $\mu$.
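
The behavior of the moment recursion (27) can be explored by simply iterating it. The sketch below uses assumed names and an arbitrarily chosen starting point and parameter set; it illustrates convergence for one case and is in no way a proof of global convergence.

```python
def moment_map(g, theta1, sigma2, mu):
    """One step of eq. (27); g = (gamma_1, gamma_2, gamma_3)."""
    g1, g2, g3 = g
    th = g3 + theta1                    # E{theta_hat_1(t)} by eq. (26)
    return (th * th * g1 + sigma2 * g3 * g3,
            th * th * g2 + th * g1 + sigma2 * g3,
            g3 - 2 * mu * g2)


def iterate_moments(g0, theta1, sigma2, mu, steps):
    """Apply eq. (27) repeatedly, starting from g0."""
    g = g0
    for _ in range(steps):
        g = moment_map(g, theta1, sigma2, mu)
    return g


# Example start: mean coefficient at 0 while theta_1 = 0.5, so gamma_3 = -0.5.
final = iterate_moments((0.25, 0.0, -0.5), theta1=0.5, sigma2=1.0,
                        mu=0.01, steps=3000)
```

In this example all three components of `final` shrink toward zero, which corresponds to the MSE approaching $\sigma^2$ and the mean coefficient approaching $\theta_1$.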

Let us linearize eq. (27) around the origin to get

$$\gamma_1(t+1) = \theta_1^2\gamma_1(t)$$

$$\gamma_2(t+1) = \theta_1^2\gamma_2(t) + \theta_1\gamma_1(t) + \sigma^2\gamma_3(t)$$

$$\gamma_3(t+1) = \gamma_3(t) - 2\mu\gamma_2(t). \tag{28}$$

The coefficients matrix is

$$A = \begin{bmatrix} \theta_1^2 & 0 & 0 \\ \theta_1 & \theta_1^2 & \sigma^2 \\ 0 & -2\mu & 1 \end{bmatrix}$$

and to ensure convergence all eigenvalues of $A$ must be within the unit
circle. The eigenvalues of $A$ are

$$\lambda_1 = \theta_1^2, \qquad \lambda_{2,3} = \tfrac{1}{2}\big\{1 + \theta_1^2 \pm [(1 - \theta_1^2)^2 - 8\mu\sigma^2]^{1/2}\big\},$$

and it can be verified that choosing

$$\mu < \;\ldots \tag{29}$$

will guarantee the convergence of eq. (28).

Condition (29) implies that if $|\theta_1|$ is close to one, $\mu$ must be chosen
very small and the convergence will be slow. Again, our simulation
experiments verified this observation.
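
The local stability requirement can be examined numerically from the eigenvalues of the linearized system (28). The bound used below, $\mu < (1 - \theta_1^2)/(2\sigma^2)$, is derived here and is not necessarily the form stated in (29): when $\lambda_{2,3}$ are real they lie inside the unit circle for any $\mu > 0$ and $|\theta_1| < 1$, and when they are complex their squared modulus is $\theta_1^2 + 2\mu\sigma^2$. The function names are assumptions.

```python
import cmath


def eigenvalues(theta1, sigma2, mu):
    """Eigenvalues of the coefficients matrix of eq. (28)."""
    lam1 = theta1 ** 2
    root = cmath.sqrt((1 - theta1 ** 2) ** 2 - 8 * mu * sigma2)
    return [lam1, (1 + theta1 ** 2 + root) / 2, (1 + theta1 ** 2 - root) / 2]


def is_locally_stable(theta1, sigma2, mu):
    """True when all eigenvalues lie strictly inside the unit circle."""
    return max(abs(lam) for lam in eigenvalues(theta1, sigma2, mu)) < 1


def mu_bound(theta1, sigma2):
    """Assumed bound (derived in the lead-in, not quoted from the paper)."""
    return (1 - theta1 ** 2) / (2 * sigma2)
```

For $\theta_1 = 0.9$ and $\sigma^2 = 1$ this bound is about 0.095, while for $\theta_1 = 0.5$ it is 0.375, in line with the observation that $|\theta_1|$ close to one forces a small $\mu$.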

IV. SIMULATION RESULTS 

We divide our experiments with AGES into two parts. In the first
part we applied both ES and AGES to computer-generated data
and compared the results. In the second part we applied AGES to
real data that we took from Ref. 7.

Fig. 3. Comparison of forecasting performance between ES ($\beta$ = 0.8) and AGES.

Equation (16) was used to generate data of type S, LT, and LSM by
the computer. The results of applying both ES and AGES to these
data are presented in Figs. 3, 4, and 5 and in Table I. Each point on
the curves of Fig. 3 corresponds to a complete run on a sequence of
data generated with the particular choice of the $\theta_j$. The resulting MSE
for the ES and the AGES forecasts are presented and the comparison
clearly indicates the superiority of the AGES algorithm. In addition,
we observe that AGES results, in almost all the runs, in an MSE
very close to the minimum, $\sigma^2$.
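
Generating test data from eq. (16) amounts to solving it for $x(t)$: $x(t) = A(D)x(t-1) + \epsilon(t) - \sum_j\theta_j\epsilon(t-j)$. The sketch below is an illustration under assumptions made here: Gaussian noise, zero values for all quantities before $t = 0$, and a lag-to-coefficient mapping for $A(D)$. The paper does not state how its simulated series were initialized.

```python
import random


def generate(A, theta, n, sigma=1.0, seed=0):
    """Return (x, eps) where x follows eq. (16) driven by white noise eps.

    A maps lag -> coefficient of A(D) in eq. (13); theta[j-1] is theta_j.
    Values of x and eps before t = 0 are taken as zero (an assumption)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    x = []
    for t in range(n):
        # A(D) x(t-1): each lag in A is shifted by one extra delay.
        ar = sum(c * x[t - 1 - lag]
                 for lag, c in A.items() if t - 1 - lag >= 0)
        ma = eps[t] - sum(th * eps[t - j]
                          for j, th in enumerate(theta, start=1)
                          if t - j >= 0)
        x.append(ar + ma)
    return x, eps


# LT-type data with M = 2 generating coefficients.
x_lt, eps_lt = generate({0: 2.0, 1: -1.0}, [0.6, -0.2], 200, seed=3)
```

Series produced this way can be fed to both a fixed-coefficient ES and an adaptive scheme, so that the resulting MSE values can be compared against the known noise variance.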

In Fig. 5, we followed the variation of the $\hat\theta_i(t)$ with time in a number
of runs. The results clearly show that the $\hat\theta_i(t)$ converge to the $\theta_i$ from
a variety of initial values; this indicates global convergence properties.
Similar results are observed in Table I for data with seasonal
multiplicative effects and linear trend. The $\hat\theta_i(t)$ clearly converge to the $\theta_i$'s,
and the MSE, when AGES is applied, is again very close to the optimal
value, $\sigma^2$.

Fig. 4. Comparison of mean squared error in forecasting with ES ($\hat\theta_1 = \hat\theta_2 = -0.3$)
and AGES as a function of the data-generating parameters $\theta_1$ and $\theta_2$. (a) $\theta_2 = -0.9$.
(b) $\theta_2 = -0.6$.

Fig. 5. Convergence of: (a) $\hat\theta_1(t)$ to the optimal value $\theta_1$ in the AGES method, (b) $\hat\theta_i(t)$
from various initial conditions to $\theta_1$ and $\theta_2$, using AGES on data with linear trend.

From Ref. 7 we took data of the simple kind (no linear trend
or seasonal effects): the IBM common stock closing prices, daily,
from May 17, 1961 through November 2, 1962. We applied
both ES and AGES to these data, and the results are presented in Fig. 6. Each point
on the curves corresponds to a run on the same data, each time with
a different coefficient (for the ES) and a different initial condition (for
the AGES). The further the coefficient used in the ES is from $\theta_1$
(which in this case is equal to -0.1, as indicated in Ref. 7), the greater
the advantage of AGES.

Further experiments were conducted on monthly international airline
passenger data (Ref. 7). These data, as Fig. 7 indicates, exhibit linear
trend and multiplicative seasonal effects. We applied the AGES algorithm
(with $M = 3$) and the results are presented in Fig. 8. In Ref. 7
it is claimed that sometimes rather than work with the actual data it
Table I. Comparison of MSE in forecasting with ES ($\hat\theta_1 = -0.2$, $\hat\theta_2 = 0.5$, $\hat\theta_3 = 0.4$) and AGES

Data-generating coefficients (and adaptive coefficients*)      MSE ($\times\sigma^2$)

$\theta_1$ ($\hat\theta_1$)    $\theta_2$ ($\hat\theta_2$)    $\theta_3$ ($\hat\theta_3$)    AGES      ES

 1.4  (1.39)       -1.3  (-1.29)      0.8 (0.75)       1.1972    11.7155
 2.1  (1.97)       -1.95 (-1.79)      0.8 (0.68)       1.3831    20.5821
 0.75 (0.69)       -0.6  (-0.56)      0.8 (0.78)       1.0749     5.1431
 0.6  (0.6)        -0.75 (-0.76)      0.8 (0.77)       1.0600     5.0256
-0.75 (-0.73)       0.6  (0.6)        0.8 (0.79)       1.0663     1.3577
 0.0  (-0.04)       0.0  (0.01)       0.0 (0.03)       1.0040     1.5737
 1.0  (0.98)       -1.0  (-0.99)      1.0 (0.94)       1.2001     8.6414
-0.2  (-0.16)       0.5  (0.49)       0.4 (0.4)        1.0397     1.0137
-0.1  (-0.09)       0.25 (0.26)       0.4 (0.41)       0.9973     1.0839
 1.2  (1.19)       -0.9  (-0.81)      0.4 (0.36)       1.1044     7.1069
 1.8  (1.74)       -1.35 (-1.24)      0.4 (0.34)       1.2192    13.0215
 0.3  (0.31)       -0.75 (-0.75)      0.4 (0.38)       1.0574     3.8630
 1.0  (0.96)       -0.5  (-0.48)      0.0 (-0.04)      1.0605     4.2218
 1.5  (1.42)       -0.75 (-0.66)      0.0 (-0.05)      1.1165     6.9455
 0.75 (0.77)        0.0  (0.03)       0.0 (0.03)       1.0212     2.4687
 0.0  (-0.03)      -0.75 (-0.74)      0.0 (0.02)       1.0314     3.5907
 0.2  (0.19)        0.5  (0.53)      -0.4 (-0.43)      1.0186     1.7692
 1.2  (1.19)       -0.15 (-0.14)     -0.4 (-0.39)      1.1115     4.0338
-1.8  (-1.7)       -1.35 (-1.22)     -0.4 (-0.36)      1.2077    13.8494
-0.3  (-0.3)       -0.75 (-0.76)     -0.4 (-0.39)      1.0438     4.6957
-0.75 (-0.79)      -0.3  (-0.28)     -0.4 (-0.38)      1.0062     3.9623
 0.4  (0.43)        0.5  (0.49)      -0.8 (-0.78)      1.0461     2.6126
-0.7  (-0.73)      -0.65 (-0.62)     -0.8 (-0.8)       1.0670     6.1628
-0.6  (-0.61)      -0.75 (-0.75)     -0.8 (-0.79)      1.0815     6.8677
-0.75 (-0.74)      -0.6  (-0.56)     -0.8 (-0.74)      1.1065     7.0182
-0.5  (-0.52)      -0.4  (-0.38)     -0.8 (-0.81)      1.0759     5.1676

* The values to which $\hat\theta_i(t)$ converge are given in parentheses.



is more convenient to work with the logarithm of the data. As we
argue in Appendix B, these logarithms, as data, have linear trend and
additive seasonal effects (see Fig. 8). Hence, we applied AGES for
linear trend and additive seasonal effects to the logarithms, and the
results are presented in Fig. 8a ($M = 3$). We used the same data (the
logarithms) to see whether the performance improves with larger $M$.
AGES was applied with $M = 13$ and the results, as presented in Fig.
8b, clearly indicate that for these data $M = 3$ was sufficient.

V. CONCLUSIONS 

In this paper we have introduced a new forecasting technique, 
Adaptive Gradient Exponential Smoothing (AGES), which is based 
on Exponential Smoothing (ES). We have elaborated on the optimality 
properties in the MSE sense of the ES. For certain types of data, the 
ES can result in optimal performance provided some coefficients are 
known. In general, these coefficients are unavailable, and the AGES 
shows strong indications of converging to these unknown coefficients 
and providing optimal performance. 
Fig. 6. Comparison of performance of forecasting with the ES (varying the coefficient
in each run) and the AGES methods.

Fig. 7. Forecasting with AGES international airline passengers ($M = 3$). (Note that
the data have linear trend and multiplicative seasonal effects.)



Clearly, more extensive experiments and practical use of the pro- 
posed forecasting technique, the AGES, are required. A user-friendly 
software package can be developed for implementation of this tech- 
nique if sufficient interest is generated. 
Fig. 8. Forecasting with AGES the logarithm of the data in Fig. 7 for: (a) $M = 3$.
(b) $M = 13$. (Note that the logarithm of the data in Fig. 7 has the form of data with
additive seasonal effects and linear trend.)
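APPENDIX A

Necessity of Conditions 1 and 2 for the Convergence of $e(t)$ to $\epsilon(t)$ in
Equation (17)

Condition 1 is clearly necessary (as well as sufficient) for the
convergence of $E\{e^2(t)\}$ to a finite value. We want to show that
Condition 2 is necessary for $E\{e^2(t)\}$ to converge to $\sigma^2$.

Let

$$\gamma_{e\epsilon}(\tau) = E\{e(t)\,\epsilon(t-\tau)\} \tag{30}$$

and

$$\gamma_{ee}(\tau) = E\{e(t)\,e(t-\tau)\}. \tag{31}$$

Clearly,

$$\gamma_{ee}(\tau) = \gamma_{ee}(-\tau) \tag{32}$$

and, from eq. (17) and the definition of $\epsilon(t)$,

$$\gamma_{e\epsilon}(-\tau) = 0, \qquad \tau > 0. \tag{33}$$

With these definitions it follows from eq. (17), after transients die,
that

$$\gamma_{e\epsilon}(0) = \sigma^2$$

$$\gamma_{e\epsilon}(1) - \hat\theta_1\gamma_{e\epsilon}(0) = -\theta_1\sigma^2$$

$$\gamma_{e\epsilon}(2) - \hat\theta_1\gamma_{e\epsilon}(1) - \hat\theta_2\gamma_{e\epsilon}(0) = -\theta_2\sigma^2$$

$$\vdots$$

$$\gamma_{e\epsilon}(M) - \hat\theta_1\gamma_{e\epsilon}(M-1) - \cdots - \hat\theta_M\gamma_{e\epsilon}(0) = -\theta_M\sigma^2,$$

or in matrix form

$$\begin{bmatrix} 1 & 0 & \cdots & 0 \\ -\hat\theta_1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ -\hat\theta_M & -\hat\theta_{M-1} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \gamma_{e\epsilon}(0) \\ \gamma_{e\epsilon}(1) \\ \vdots \\ \gamma_{e\epsilon}(M) \end{bmatrix} = -\sigma^2 \begin{bmatrix} -1 \\ \theta_1 \\ \vdots \\ \theta_M \end{bmatrix}. \tag{34}$$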
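Also,

$$\gamma_{ee}(0) - \hat\theta_1\gamma_{ee}(1) - \hat\theta_2\gamma_{ee}(2) - \cdots - \hat\theta_M\gamma_{ee}(M) = \gamma_{e\epsilon}(0) - \theta_1\gamma_{e\epsilon}(1) - \cdots - \theta_M\gamma_{e\epsilon}(M)$$

$$\gamma_{ee}(1) - \hat\theta_1\gamma_{ee}(0) - \hat\theta_2\gamma_{ee}(1) - \cdots - \hat\theta_M\gamma_{ee}(M-1) = -\theta_1\gamma_{e\epsilon}(0) - \cdots - \theta_M\gamma_{e\epsilon}(M-1)$$

$$\gamma_{ee}(2) - \hat\theta_1\gamma_{ee}(1) - \hat\theta_2\gamma_{ee}(0) - \cdots - \hat\theta_M\gamma_{ee}(M-2) = -\theta_2\gamma_{e\epsilon}(0) - \cdots - \theta_M\gamma_{e\epsilon}(M-2)$$

$$\vdots$$

$$\gamma_{ee}(M) - \hat\theta_1\gamma_{ee}(M-1) - \hat\theta_2\gamma_{ee}(M-2) - \cdots - \hat\theta_M\gamma_{ee}(0) = -\theta_M\gamma_{e\epsilon}(0),$$

or again in a matrix form

$$\left( \begin{bmatrix} 1 & 0 & \cdots & 0 \\ -\hat\theta_1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ -\hat\theta_M & -\hat\theta_{M-1} & \cdots & 1 \end{bmatrix} - \begin{bmatrix} 0 & \hat\theta_1 & \hat\theta_2 & \cdots & \hat\theta_M \\ 0 & \hat\theta_2 & \hat\theta_3 & \cdots & 0 \\ \vdots & \vdots & & & \vdots \\ 0 & \hat\theta_M & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \right) \begin{bmatrix} \gamma_{ee}(0) \\ \gamma_{ee}(1) \\ \gamma_{ee}(2) \\ \vdots \\ \gamma_{ee}(M) \end{bmatrix} = \begin{bmatrix} 1 & -\theta_1 & \cdots & -\theta_M \\ -\theta_1 & -\theta_2 & \cdots & 0 \\ \vdots & & & \vdots \\ -\theta_M & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} \gamma_{e\epsilon}(0) \\ \gamma_{e\epsilon}(1) \\ \gamma_{e\epsilon}(2) \\ \vdots \\ \gamma_{e\epsilon}(M) \end{bmatrix}. \tag{35}$$

Now, if we claim that $e(t)$ converges to $\epsilon(t)$, it means that

$$\gamma_{ee}(\tau) = \delta(\tau)\sigma^2 = \begin{cases} \sigma^2 & \text{for } \tau = 0 \\ 0 & \text{for } \tau \ne 0 \end{cases}$$

and

$$\gamma_{e\epsilon}(\tau) = \delta(\tau)\sigma^2.$$

Then, substituting this in (34) or (35) results in

$$\hat\theta_j = \theta_j \quad \text{for } j = 1, 2, \ldots, M,$$

which is Condition 2. Hence this condition is necessary as claimed.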

APPENDIX B 

Possible Transformation of Multiplicative Seasonal Effects Into Additive 
Seasonal Effects 

Suppose we are given data of the noise-free form

$$y(t) = (a + bt)\,c(t), \qquad c(t+L) = c(t), \tag{36}$$

which has linear trend and multiplicative seasonal effects.
Let

$$z(t) = \log[y(t)]. \tag{37}$$

Then substitution of (36) gives (if we assume $bt \ll a$, which is true in
most real data):

$$\begin{aligned} z(t) &= \log a + \log\Big(1 + \frac{b}{a}t\Big) + \log c(t) \\ &\approx \log a + \frac{b}{a}t + \log c(t) \\ &= \bar{a} + \bar{b}t + \bar{c}(t), \end{aligned} \tag{38}$$

where

$$\bar{a} = \log a, \qquad \bar{b} = \frac{b}{a}, \qquad \bar{c}(t) = \log c(t).$$

Hence $z(t)$ clearly has the form of data with linear trend and additive
seasonal effects.
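
The quality of the approximation in (38) can be checked on a small noise-free example. The values of $a$, $b$, the seasonal factors, and the series length below are arbitrary choices made for illustration, and the function names are assumptions. The error of the step $\log(1+u) \approx u$ is at most $u^2/2$ for $u \ge 0$, with $u = bt/a$.

```python
import math


def lsm_series(a, b, c, n):
    """Noise-free LSM data y(t) = (a + b*t) * c(t), with period len(c)."""
    L = len(c)
    return [(a + b * t) * c[t % L] for t in range(n)]


def lsa_approximation(a, b, c, n):
    """Right-hand side of eq. (38): log a + (b/a) t + log c(t)."""
    L = len(c)
    return [math.log(a) + (b / a) * t + math.log(c[t % L])
            for t in range(n)]


a, b = 100.0, 0.1
c = [0.8, 1.0, 1.3, 0.9]
n = 48
z = [math.log(y) for y in lsm_series(a, b, c, n)]
z_lsa = lsa_approximation(a, b, c, n)
max_err = max(abs(u - v) for u, v in zip(z, z_lsa))
```

Here `max_err` stays below $(b(n-1)/a)^2/2$, which is small as long as $bt \ll a$ over the span of the data; for strongly trending series the linear term becomes a poorer description of the logarithm.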